New Filter method for categorical variables’ selection
Abstract
Variable selection has become an increasingly important challenge, given the dramatic growth in the size of databases and in the number of variables to be explored and modeled. Several strategies and methods have therefore been developed to select the minimum number of variables while preserving as much information as possible about the variable of interest of the system being modeled (the variable to predict). In this work, we present a novel Filter method for variable selection, distinguished by its joint use of both simple and multivariate analyses. First, we review the major existing strategies and methods. Second, we present our new method and compare its results with those of the existing methods. The experiments were carried out on two databases: a cardiac diagnosis dataset, "Spect Heart", and a car diagnosis dataset, "Car Diagnosis 2". The final section presents the conclusion and some perspectives for future research.
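The abstract does not describe the proposed method in detail. As a generic illustration of the univariate filter strategy only, and not of the authors' method, the sketch below ranks categorical variables by their mutual information with the variable to predict. The function names and the toy data are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two categorical sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def rank_variables(columns, target):
    """Return (name, score) pairs sorted by decreasing mutual information."""
    scores = {name: mutual_information(col, target) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data (not taken from "Spect Heart" or "Car Diagnosis 2").
columns = {
    "a": ["x", "x", "y", "y"],  # fully determines the target
    "b": ["p", "q", "p", "q"],  # independent of the target
}
target = ["0", "0", "1", "1"]

ranking = rank_variables(columns, target)
```

A filter of this kind scores each variable independently of any classifier. It is the "simple" (one variable at a time) half of the picture and does not account for redundancy between variables, which is what a multivariate analysis would address.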
Similar resources
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
Asymptotic Properties of the Joint Neighborhood Selection Method for Estimating Categorical Markov Networks
The joint neighborhood selection method was proposed as a fast algorithm to estimate parameters of a Markov network for binary variables and identify the underlying graphical model. This paper shows that this method leads to consistent parameter estimation and model selection under high-dimensional asymptotics. We also apply the algorithm to the voting records of US senators to illustrate the k...
Covariance and PCA for Categorical Variables
Covariances from categorical variables are defined using a regular simplex expression for categories. The method follows the variance definition by Gini, and it gives the covariance as a solution of simultaneous equations. The calculated results give reasonable values for test data. A method of principal component analysis (RS-PCA) is also proposed using regular simplex expressions, which allow...
Developing a Filter-Wrapper Feature Selection Method and its Application in Dimension Reduction of Gen Expression
Nowadays, the increasing volume of data and number of attributes in datasets has reduced the accuracy of learning algorithms and increased their computational complexity. Feature selection is a dimensionality reduction method, performed through filtering and wrapping. Wrapper methods are more accurate than filter methods, whereas filter methods run faster and have a lower computational burden. With ...
SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy
In this paper, we propose a new gene selection algorithm based on Shuffled Frog Leaping Algorithm that is called SFLA-FS. The proposed algorithm is used for improving cancer classification accuracy. Most of the biological datasets such as cancer datasets have a large number of genes and few samples. However, most of these genes are not usable in some tasks for example in cancer classification....